Submission for MIE1517 - Introduction to Deep Learning
Team 7
Taeyeon Kim, Sameeksha Naik, Kexin Qin
Dec 8. 2023
The link to this Google Colab file is below:
https://drive.google.com/file/d/1up_qnnzx-mobJznSdCRdtR-2jjF7abns/view?usp=sharing
This project explores how convolutional neural networks can be used to classify art styles across different time periods and cultures. Such a tool could assist art historians as they evaluate and authenticate pieces of art.
The WikiArt dataset on Kaggle was used as a starting point. This dataset features 80k images in 27 different styles. To scale down computation, five art styles were selected, taking into consideration their time periods and similarity to other styles. Between 300 and 400 images of each style were selected, and a custom dataset was created; it is hosted on Google Drive with this link. The dataset has 1669 images.
The custom dataset is organized as follows:
Project_Data/
├─ Baroque/
│ ├─ annibale-carracci_two-children-teasing-a-cat-1590.jpg
│ ├─ annibale-carracci_venus-adonis-and-cupid-1590.jpg
│ ├─ ...
├─ Cubism/
├─ Minimalism/
├─ Popart/
├─ Ukiyo/
Using the split-folders package together with PyTorch's native ImageFolder class, the data is split into training, validation, and test sets in a 60/20/20 ratio. The resulting file structure looks as follows:
my_dataset/
├─ train/
│ ├─ Baroque
│ ├─ Cubism
│ ├─ Minimalism
│ ├─ Popart
│ ├─ Ukiyo
├─ val/
│ ├─ Baroque
│ ├─ Cubism
│ ├─ ...
├─ test/
Since the model requires all inputs to be the same size, every image is resized to 224 x 224 pixels. The images are simply resized, not cropped or padded, so the aspect ratio is altered.
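As a small illustration (the 400 x 300 dimensions below are hypothetical), the per-axis scale factors show how a non-square painting is distorted by this resize:

```python
# Per-axis scale factors when forcing a hypothetical 400x300 painting to 224x224
orig_w, orig_h = 400, 300
scale_w, scale_h = 224 / orig_w, 224 / orig_h
# unequal factors mean the aspect ratio changes (4:3 becomes 1:1)
print(round(scale_w, 3), round(scale_h, 3))  # 0.56 0.747
```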
# import necessary libraries
import torch
import torchvision
import torchvision.transforms as transforms
import torchvision.models
from google.colab import drive
import matplotlib.pyplot as plt
import torch.optim as optim
import torch.nn as nn
import torch.nn.functional as F
import time
from torchvision import datasets, transforms
import seaborn as sns
from sklearn.metrics import f1_score, precision_recall_fscore_support, confusion_matrix
from matplotlib.table import Table
import pandas as pd
!pip install split-folders
Collecting split-folders
  Downloading split_folders-0.5.1-py3-none-any.whl (8.4 kB)
Installing collected packages: split-folders
Successfully installed split-folders-0.5.1
from google.colab import drive
drive.mount('/content/drive')
import splitfolders
# Define the train/validation/test ratio
splitfolders.ratio("/content/drive/MyDrive/Colab Notebooks/Project_Data",
output="my_dataset", seed=1, ratio=(.6, .2, .2), group_prefix=None)
data_transform = transforms.Compose( [transforms.Resize((224,224)),
transforms.ToTensor()])
# Create the three data sets
train_data = datasets.ImageFolder('my_dataset/train', transform=data_transform)
validation_data = datasets.ImageFolder('my_dataset/val', transform=data_transform)
test_data = datasets.ImageFolder('my_dataset/test', transform=data_transform)
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
Copying files: 1669 files [00:09, 183.62 files/s]
We defined the standard train and get_accuracy functions, which handle the forward pass, the backward pass, and the accuracy computation for each epoch. After each epoch, the accuracies and loss are recorded so they can be plotted once training is complete.
def train(model, train_data, val_data, batch_size=64, learning_rate=0.01, num_epochs=1):
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(model.parameters(), lr=learning_rate, momentum=0.9)
    # use the batch_size argument rather than a hard-coded value
    train_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size, shuffle=True)
    val_loader = torch.utils.data.DataLoader(val_data, batch_size=batch_size, shuffle=True)
    iters, losses, train_acc, val_acc = [], [], [], []

    # training
    for epoch in range(num_epochs):
        for imgs, labels in iter(train_loader):
            #############################################
            # To Enable GPU Usage
            if use_cuda and torch.cuda.is_available():
                imgs = imgs.cuda()
                labels = labels.cuda()
            #############################################
            out = model(imgs)              # forward pass
            loss = criterion(out, labels)  # compute the loss
            loss.backward()                # backward pass (compute parameter updates)
            optimizer.step()               # make the updates for each parameter
            optimizer.zero_grad()          # a clean-up step for PyTorch

        # save the current training information once per epoch
        iters.append(epoch)
        losses.append(float(loss))  # CrossEntropyLoss already averages over the batch
        train_accuracy = get_accuracy(model, train_loader)
        val_accuracy = get_accuracy(model, val_loader)
        print('Epoch:', epoch, 'Train Accuracy:', train_accuracy, 'Validation Accuracy:', val_accuracy)
        train_acc.append(train_accuracy)  # reuse the accuracies computed above
        val_acc.append(val_accuracy)

    # plotting
    plt.title("Training Curve")
    plt.plot(iters, losses, label="Train")
    plt.xlabel("Epoch")
    plt.ylabel("Loss")
    plt.show()

    plt.title("Training Curve")
    plt.plot(iters, train_acc, label="Train")
    plt.plot(iters, val_acc, label="Validation")
    plt.xlabel("Epoch")
    plt.ylabel("Accuracy")
    plt.legend(loc='best')
    plt.show()

    print("Final Training Accuracy: {}".format(train_acc[-1]))
    print("Final Validation Accuracy: {}".format(val_acc[-1]))
def get_accuracy(model, data_loader):
    correct = 0
    total = 0
    with torch.no_grad():  # no gradients needed for evaluation
        for imgs, labels in data_loader:
            #############################################
            # To Enable GPU Usage
            if use_cuda and torch.cuda.is_available():
                imgs = imgs.cuda()
                labels = labels.cuda()
            #############################################
            output = model(imgs)
            # select the index with the maximum prediction score
            pred = output.max(1, keepdim=True)[1]
            correct += pred.eq(labels.view_as(pred)).sum().item()
            total += imgs.shape[0]
    return correct / total
The first strategy was to develop a custom CNN network for feature extraction and classification. The following model has two convolutional layers and two fully connected layers. Between convolutional layers, max-pooling was applied. The figure below summarizes the architecture.
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Conv2d(3, 5, 5)         # in channels, out channels, kernel size
        self.pool = nn.MaxPool2d(2, 2)          # kernel size, stride
        self.conv2 = nn.Conv2d(5, 10, 5)        # in channels, out channels, kernel size
        self.fc1 = nn.Linear(10 * 53 * 53, 32)  # 1st fully connected layer
        self.fc2 = nn.Linear(32, 5)             # 2nd fully connected layer

    def forward(self, x):  # x is the input
        x = self.pool(F.relu(self.conv1(x)))  # 1st convolution layer + ReLU + max-pooling
        x = self.pool(F.relu(self.conv2(x)))  # 2nd convolution layer + ReLU + max-pooling
        x = x.view(-1, 10 * 53 * 53)          # flatten the feature maps
        x = torch.relu(self.fc1(x))           # 1st fully connected layer + ReLU
        x = self.fc2(x)                       # 2nd fully connected layer
        return x
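As a sanity check on the 10 * 53 * 53 input size of fc1, the feature-map sizes can be traced through the network: a stride-1, unpadded convolution shrinks a side by kernel_size - 1, and 2x2 max-pooling halves it.

```python
def conv_out(size, kernel, stride=1):
    """Output side length of an unpadded convolution."""
    return (size - kernel) // stride + 1

s = conv_out(224, 5)   # conv1: 224 -> 220
s = s // 2             # 2x2 max-pool: 220 -> 110
s = conv_out(s, 5)     # conv2: 110 -> 106
s = s // 2             # 2x2 max-pool: 106 -> 53
print(s)               # 53, matching nn.Linear(10 * 53 * 53, 32)
```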
We'll now train the first CNN model with the data. The hyperparameters are as follows:
batch size: 64
learning rate: 0.005
number of epochs: 8
use_cuda = True
model = CNN()
if use_cuda and torch.cuda.is_available():
    model = model.cuda()
    print('CUDA is available. Training on GPU')
else:
    print('CUDA is not available. Training on CPU')

start_time = time.time()
train(model, train_data, validation_data, batch_size=64, learning_rate=0.005, num_epochs=8)
end_time_CNN_1 = time.time() - start_time
# print("The CNN takes", end_time_CNN_1, "seconds to train")
CUDA is available. Training on GPU
Epoch: 0 Train Accuracy: 0.244 Validation Accuracy: 0.22289156626506024
Epoch: 1 Train Accuracy: 0.368 Validation Accuracy: 0.3493975903614458
Epoch: 2 Train Accuracy: 0.455 Validation Accuracy: 0.42771084337349397
Epoch: 3 Train Accuracy: 0.491 Validation Accuracy: 0.45180722891566266
Epoch: 4 Train Accuracy: 0.474 Validation Accuracy: 0.4427710843373494
Epoch: 5 Train Accuracy: 0.394 Validation Accuracy: 0.3825301204819277
Epoch: 6 Train Accuracy: 0.556 Validation Accuracy: 0.49698795180722893
Epoch: 7 Train Accuracy: 0.479 Validation Accuracy: 0.39457831325301207
Final Training Accuracy: 0.494
Final Validation Accuracy: 0.4819277108433735
The final training accuracy is about 49% and the validation accuracy about 48%. While this shows the model can learn from this dataset, there is clear room for improvement.
This prompted the use of Transfer Learning with AlexNet. The pretrained weights may noticeably improve the results. Should this be the case, expanding the project to add more classes is a possibility.
First, our image data is resized to 224x224 pixels and converted into tensors. We then iterate through the training, validation, and test datasets, extracting features with the pretrained AlexNet model's convolutional layers. These extracted feature tensors are saved locally, which avoids re-computing the features each time the model is run and helps free up GPU memory and reduce training and evaluation time.
# TRANSFER LEARNING with ALEXNET
import torchvision.models
alexnet = torchvision.models.alexnet(pretrained=True) # weights
import numpy as np
np.random.seed(1000)
torch.manual_seed(1000)
<torch._C.Generator at 0x79ea5f705530>
import os
import splitfolders
from google.colab import drive
drive.mount('/content/drive')
splitfolders.ratio("/content/drive/MyDrive/Colab Notebooks/Project_Data", output="my_dataset", seed=1, ratio=(.6, .2, .2), group_prefix=None)
from torchvision import datasets, transforms
data_transform = transforms.Compose( [transforms.Resize((224,224)),
transforms.ToTensor()])
train_data = datasets.ImageFolder('my_dataset/train', transform=data_transform)
validation_data = datasets.ImageFolder('my_dataset/val', transform=data_transform)
test_data = datasets.ImageFolder('my_dataset/test', transform=data_transform)
batch_size = 1
num_workers = 0
train_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size,
num_workers=num_workers, shuffle=True)
val_loader = torch.utils.data.DataLoader(validation_data, batch_size=batch_size,
num_workers=num_workers, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_data, batch_size=batch_size,
num_workers=num_workers, shuffle=True)
classes=['Baroque','Cubism','Minimalism','Popart','Ukiyo']
# Extract AlexNet features for each split and save them as tensors
for split, loader in [('train', train_loader), ('val', val_loader), ('test', test_loader)]:
    i = 0
    for img, label in loader:
        features = alexnet.features(img)     # AlexNet convolutional features
        features_tensor = features.detach()  # detach from the computation graph
        save_features_dir = '/content/my_dataset/' + split + '/' + str(classes[label.item()]) + '/'
        if not os.path.isdir(save_features_dir):
            os.mkdir(save_features_dir)
        torch.save(features_tensor.squeeze(0), save_features_dir + str(i) + '.tensor')
        i += 1
Mounted at /content/drive
Copying files: 1669 files [01:05, 25.50 files/s]
To verify the size of the input to the model, the size of the feature tensor was printed. The feature tensors have 256 channels and a spatial size of 6x6, so the downstream CNN models must use in_channels of 256 to match. The small 6x6 spatial size also limits how complex the convolutional architecture can be.
print(features.size())
torch.Size([1, 256, 6, 6])
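Simple convolution arithmetic shows why the 6x6 feature maps limit depth: each unpadded, stride-1 3x3 convolution removes two pixels per side, so at most two such layers fit before the map vanishes.

```python
def conv_out(size, kernel=3):
    # side length after a stride-1, unpadded convolution
    return size - kernel + 1

s = 6
s = conv_out(s)  # first 3x3 conv: 6 -> 4
s = conv_out(s)  # second 3x3 conv: 4 -> 2
print(s)         # 2; a third 3x3 conv would leave a 0x0 map
```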
The class_to_idx attribute of the dataset was inspected to verify the indices assigned to each class. These indices will be used later when implementing the model on new data.
print(train_data.class_to_idx)
{'Baroque': 0, 'Cubism': 1, 'Minimalism': 2, 'Popart': 3, 'Ukiyo': 4}
We created a CNN model that takes AlexNet features and consists of two convolutional layers and two fully connected layers. The loss we minimize is the multi-class cross-entropy loss: $$ L(y, p) = - \sum_{i=1}^{N} y_i \log(p_i) $$ where $y$ is the one-hot target vector, $p$ is the vector of predicted class probabilities, and $N$ is the number of classes. The last fully connected layer maps the features into 5 outputs, corresponding to the number of art styles we have, so the model can classify images into the different art styles. Lastly, we tuned our hyperparameters, trying different combinations of batch size, learning rate, and number of epochs to balance training time against accuracy while preventing overfitting.
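As a worked example (with a hypothetical softmax output), the loss for a single sample reduces to the negative log-probability assigned to the true class:

```python
import math

probs = [0.05, 0.10, 0.70, 0.10, 0.05]  # hypothetical predicted probabilities for the 5 styles
target = [0, 0, 1, 0, 0]                # one-hot target: true class is index 2
loss = -sum(y * math.log(p) for y, p in zip(target, probs))
print(round(loss, 4))  # 0.3567, i.e. -log(0.70)
```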
We will test three different CNN models and select the model with the highest accuracy. First, we are using alexnet_CNN which has the architecture shown in the image below. It has a total of 33,491 parameters.
class alexnet_CNN(nn.Module):
    def __init__(self):
        super(alexnet_CNN, self).__init__()
        self.conv1 = nn.Conv2d(256, 80, 3)     # in_channels, out_channels, kernel_size
        self.conv2 = nn.Conv2d(80, 15, 3)      # in_channels, out_channels, kernel_size
        self.fc1 = nn.Linear(15 * 2 * 2, 32)
        self.fc2 = nn.Linear(32, 5)            # 5 is the number of classes

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        x = x.view(-1, 15 * 2 * 2)  # flatten the 15 x 2 x 2 feature maps
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x
train_data_features = torchvision.datasets.DatasetFolder('/content/my_dataset/train/', loader=torch.load, extensions=('.tensor',))
val_data_features = torchvision.datasets.DatasetFolder('/content/my_dataset/val/', loader=torch.load, extensions=('.tensor',))
test_data_features = torchvision.datasets.DatasetFolder('/content/my_dataset/test/', loader=torch.load, extensions=('.tensor',))
use_cuda = True
model_an = alexnet_CNN()
if use_cuda and torch.cuda.is_available():
    model_an = model_an.cuda()
    print('CUDA is available. Training on GPU')
else:
    print('CUDA is not available. Training on CPU')

start_time = time.time()
train(model_an, train_data_features, val_data_features, batch_size=16, learning_rate=0.001, num_epochs=15)
end_time = time.time()
total_time_CNN_2 = end_time - start_time
print('The CNN takes', total_time_CNN_2, 'seconds to train')
CUDA is available. Training on GPU
Epoch: 0 Train Accuracy: 0.329 Validation Accuracy: 0.3253012048192771
Epoch: 1 Train Accuracy: 0.355 Validation Accuracy: 0.3373493975903614
Epoch: 2 Train Accuracy: 0.431 Validation Accuracy: 0.4246987951807229
Epoch: 3 Train Accuracy: 0.554 Validation Accuracy: 0.5
Epoch: 4 Train Accuracy: 0.714 Validation Accuracy: 0.6746987951807228
Epoch: 5 Train Accuracy: 0.796 Validation Accuracy: 0.7379518072289156
Epoch: 6 Train Accuracy: 0.831 Validation Accuracy: 0.7439759036144579
Epoch: 7 Train Accuracy: 0.856 Validation Accuracy: 0.7891566265060241
Epoch: 8 Train Accuracy: 0.869 Validation Accuracy: 0.7981927710843374
Epoch: 9 Train Accuracy: 0.881 Validation Accuracy: 0.8042168674698795
Epoch: 10 Train Accuracy: 0.894 Validation Accuracy: 0.8192771084337349
Epoch: 11 Train Accuracy: 0.897 Validation Accuracy: 0.8072289156626506
Epoch: 12 Train Accuracy: 0.922 Validation Accuracy: 0.8253012048192772
Epoch: 13 Train Accuracy: 0.893 Validation Accuracy: 0.8283132530120482
Epoch: 14 Train Accuracy: 0.939 Validation Accuracy: 0.8343373493975904
Final Training Accuracy: 0.939
Final Validation Accuracy: 0.8343373493975904
The CNN takes 30.856274127960205 seconds to train
alexnet_CNN overfits to the training data, as shown in the training curve above: the final training accuracy is 0.939 while the final validation accuracy is 0.834, and the validation accuracy plateaus after around 10 epochs. To see whether a different architecture helps, we next try OtherAlexCNN, which has more parameters (225,657 vs. 33,491). The architecture is illustrated below.
class OtherAlexCNN(nn.Module):
    def __init__(self, kernel_size=3):
        super(OtherAlexCNN, self).__init__()
        self.conv1 = nn.Conv2d(256, 100, kernel_size)  # in_channels, out_channels, kernel_size
        self.conv2 = nn.Conv2d(100, 50, kernel_size)   # in_channels, out_channels, kernel_size
        self.fc1 = nn.Linear(50 * 2 * 2, 32)
        self.fc2 = nn.Linear(32, 5)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        x = x.view(-1, 50 * 2 * 2)  # flatten the 50 x 2 x 2 feature maps
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x
train_data_features = torchvision.datasets.DatasetFolder('/content/my_dataset/train/', loader=torch.load, extensions=('.tensor',))
val_data_features = torchvision.datasets.DatasetFolder('/content/my_dataset/val/', loader=torch.load, extensions=('.tensor',))
test_data_features = torchvision.datasets.DatasetFolder('/content/my_dataset/test/', loader=torch.load, extensions=('.tensor',))

use_cuda = True
model_can = OtherAlexCNN()
if use_cuda and torch.cuda.is_available():
    model_can = model_can.cuda()
    print('CUDA is available. Training on GPU')
else:
    print('CUDA is not available. Training on CPU')

start_time = time.time()
train(model_can, train_data_features, val_data_features, batch_size=16, learning_rate=0.001, num_epochs=15)
end_time = time.time()
total_time_CNN_3 = end_time - start_time
print('The CNN takes', total_time_CNN_3, 'seconds to train')
CUDA is available. Training on GPU
Epoch: 0 Train Accuracy: 0.284 Validation Accuracy: 0.2680722891566265
Epoch: 1 Train Accuracy: 0.326 Validation Accuracy: 0.26506024096385544
Epoch: 2 Train Accuracy: 0.531 Validation Accuracy: 0.5180722891566265
Epoch: 3 Train Accuracy: 0.675 Validation Accuracy: 0.6295180722891566
Epoch: 4 Train Accuracy: 0.754 Validation Accuracy: 0.7018072289156626
Epoch: 5 Train Accuracy: 0.8 Validation Accuracy: 0.75
Epoch: 6 Train Accuracy: 0.814 Validation Accuracy: 0.7620481927710844
Epoch: 7 Train Accuracy: 0.86 Validation Accuracy: 0.7921686746987951
Epoch: 8 Train Accuracy: 0.863 Validation Accuracy: 0.8012048192771084
Epoch: 9 Train Accuracy: 0.878 Validation Accuracy: 0.8192771084337349
Epoch: 10 Train Accuracy: 0.892 Validation Accuracy: 0.8313253012048193
Epoch: 11 Train Accuracy: 0.904 Validation Accuracy: 0.822289156626506
Epoch: 12 Train Accuracy: 0.902 Validation Accuracy: 0.822289156626506
Epoch: 13 Train Accuracy: 0.923 Validation Accuracy: 0.8162650602409639
Epoch: 14 Train Accuracy: 0.934 Validation Accuracy: 0.8162650602409639
Final Training Accuracy: 0.934
Final Validation Accuracy: 0.8162650602409639
The CNN takes 16.094091653823853 seconds to train
The model OtherAlexCNN reaches a final validation accuracy of 0.816, slightly below alexnet_CNN, and it still overfits to the training data, with the validation accuracy plateauing around 10 epochs. To address this overfitting we will try using dropout layers.
After trying different dropout probabilities, we found that a dropout probability of 0.2, a batch_size of 64, a learning_rate of 0.001, and num_epochs of 15 gave the best results. The figure below shows the dropout architecture.
class AlexCNN_dropout(nn.Module):
    def __init__(self, kernel_size=3, dropout_prob=0.2):
        super(AlexCNN_dropout, self).__init__()
        self.conv1 = nn.Conv2d(256, 100, kernel_size)  # in_channels, out_channels, kernel_size
        self.conv2 = nn.Conv2d(100, 100, kernel_size)  # in_channels, out_channels, kernel_size
        self.fc1 = nn.Linear(100 * 2 * 2, 32)
        self.dropout = nn.Dropout(p=dropout_prob)
        self.fc2 = nn.Linear(32, 5)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = self.dropout(x)
        x = F.relu(self.conv2(x))
        x = self.dropout(x)
        x = x.view(-1, 100 * 2 * 2)  # flatten the 100 x 2 x 2 feature maps
        x = F.relu(self.fc1(x))
        x = self.dropout(x)
        x = self.fc2(x)
        return x
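nn.Dropout implements "inverted" dropout: during training each activation is zeroed with probability p and the survivors are scaled by 1/(1-p), so the expected activation is unchanged and no rescaling is needed at evaluation time. A minimal pure-Python sketch of that behavior:

```python
import random

random.seed(0)                      # for reproducibility
p = 0.2                             # dropout probability, as in AlexCNN_dropout
acts = [1.0] * 10000                # toy activations
dropped = [0.0 if random.random() < p else a / (1 - p) for a in acts]
mean = sum(dropped) / len(dropped)
print(round(mean, 2))               # close to 1.0: the expected activation is preserved
```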
train_data_features = torchvision.datasets.DatasetFolder('/content/my_dataset/train/', loader=torch.load, extensions=('.tensor',))
val_data_features = torchvision.datasets.DatasetFolder('/content/my_dataset/val/', loader=torch.load, extensions=('.tensor',))
test_data_features = torchvision.datasets.DatasetFolder('/content/my_dataset/test/', loader=torch.load, extensions=('.tensor',))

use_cuda = True
model_d = AlexCNN_dropout()
if use_cuda and torch.cuda.is_available():
    model_d = model_d.cuda()
    print('CUDA is available. Training on GPU')
else:
    print('CUDA is not available. Training on CPU')

start_time = time.time()
train(model_d, train_data_features, val_data_features, batch_size=64, learning_rate=0.001, num_epochs=15)
end_time = time.time()
total_time_d = end_time - start_time
CUDA is available. Training on GPU
Epoch: 0 Train Accuracy: 0.257 Validation Accuracy: 0.2469879518072289
Epoch: 1 Train Accuracy: 0.264 Validation Accuracy: 0.2680722891566265
Epoch: 2 Train Accuracy: 0.378 Validation Accuracy: 0.3674698795180723
Epoch: 3 Train Accuracy: 0.581 Validation Accuracy: 0.5572289156626506
Epoch: 4 Train Accuracy: 0.669 Validation Accuracy: 0.608433734939759
Epoch: 5 Train Accuracy: 0.708 Validation Accuracy: 0.6596385542168675
Epoch: 6 Train Accuracy: 0.747 Validation Accuracy: 0.6987951807228916
Epoch: 7 Train Accuracy: 0.786 Validation Accuracy: 0.7259036144578314
Epoch: 8 Train Accuracy: 0.799 Validation Accuracy: 0.7560240963855421
Epoch: 9 Train Accuracy: 0.838 Validation Accuracy: 0.7620481927710844
Epoch: 10 Train Accuracy: 0.806 Validation Accuracy: 0.7891566265060241
Epoch: 11 Train Accuracy: 0.859 Validation Accuracy: 0.8162650602409639
Epoch: 12 Train Accuracy: 0.878 Validation Accuracy: 0.8192771084337349
Epoch: 13 Train Accuracy: 0.891 Validation Accuracy: 0.8102409638554217
Epoch: 14 Train Accuracy: 0.894 Validation Accuracy: 0.8192771084337349
Final Training Accuracy: 0.906
Final Validation Accuracy: 0.8283132530120482
Using dropout layers clearly reduces the overfitting seen in the previous models, which had larger gaps between training and validation accuracy: the final training accuracy is 0.906 and the final validation accuracy is 0.828. The tradeoff is that the final validation accuracy is slightly lower than alexnet_CNN's.
We picked the model that gave the highest validation accuracy among the various CNN models we tried. The table below summarizes them. alexnet_CNN produced the best results and was used on the test dataset.
import matplotlib.pyplot as plt
import pandas as pd
models_data = [
{"Model": "alexnet_CNN", "Train Acc": 0.939, "Val Acc": 0.834, "Epochs": 15, "Learning Rate": 0.001, "Batch Size": 16, "Parameters": 33491},
{"Model": "OtherAlexCNN", "Train Acc": 0.934, "Val Acc": 0.816, "Epochs": 15, "Learning Rate": 0.001, "Batch Size": 16, "Parameters": 225657},
{"Model": "AlexCNN_dropout", "Train Acc": 0.906, "Val Acc": 0.828, "Epochs": 15, "Learning Rate": 0.001, "Batch Size": 64, "Parameters": 207857},
]
df = pd.DataFrame(models_data)
fig, ax = plt.subplots(figsize=(10, 5))
ax.axis('off')
table = ax.table(cellText=df.values, colLabels=df.columns, loc='center', cellLoc='center', colColours=["#ADD8E6"] * len(df.columns))
table.auto_set_font_size(False)
table.set_fontsize(10)
table.scale(1.5, 2.5)
plt.show()
# test alex net on test dataset
test_loader=torch.utils.data.DataLoader(test_data_features, batch_size=64, num_workers=num_workers,shuffle=True)
get_accuracy(model_an, test_loader)
0.8367952522255193
The test accuracy obtained is 83.7%. We further analyze the test results below.
The running times of the different models are summarized in the table below. The custom CNN trained on raw images is much slower, at roughly 1000 seconds, than the CNN models trained on precomputed AlexNet features, which take around 16-31 seconds. This shows the computational benefit of using pretrained features.
models = ['CNN', 'alexnet_CNN', 'OtherAlexCNN', 'AlexCNN_dropout']
running_times = [end_time_CNN_1, total_time_CNN_2, total_time_CNN_3, total_time_d]
running_times_rounded = [round(time, 3) for time in running_times]
# Create a DataFrame
df_running_times = pd.DataFrame({"Model": models, "Running Time (seconds)": running_times_rounded})
# Display the table using Matplotlib
fig, ax = plt.subplots(figsize=(8, 4))
ax.axis('off')
table = ax.table(cellText=df_running_times.values,
colLabels=df_running_times.columns,
loc='center',
cellLoc='center',
colColours=["#ADD8E6"] * len(df_running_times.columns),
cellColours=[["w"] * len(df_running_times.columns) for _ in range(len(df_running_times))])
table.auto_set_font_size(False)
table.set_fontsize(10)
table.scale(1.5, 2.5)
plt.show()
To further interpret the test results and identify which images were incorrectly labelled, we displayed those images to see whether a pattern emerges. A modified get_accuracy was used to display each misclassified test image along with its predicted and target labels.
def get_accuracy(model, data_loader, classes):
    correct = 0
    total = 0
    batch_size = data_loader.batch_size
    # load the original (un-featurized) test images for display;
    # shuffle=False keeps them aligned with the feature dataset order
    test_loader_orig = torch.utils.data.DataLoader(test_data, batch_size=batch_size,
                                                   num_workers=num_workers, shuffle=False)
    dataiter = iter(test_loader_orig)
    images, labels = next(dataiter)  # note: only the first batch is loaded for display
    images = images.numpy()          # convert images to numpy for display
    fig = plt.figure(figsize=(25, 4))
    for i, (imgs, labels) in enumerate(data_loader):
        if use_cuda and torch.cuda.is_available():
            imgs = imgs.cuda()
            labels = labels.cuda()
        output = model(imgs)
        # select the index with the maximum prediction score
        pred = output.max(1, keepdim=True)[1]
        is_correct = pred.eq(labels.view_as(pred))
        correct += is_correct.sum().item()
        total += imgs.shape[0]
        incorrect_indices = (~is_correct).nonzero(as_tuple=True)[0]
        print(incorrect_indices.shape)
        for idx in range(incorrect_indices.shape[0]):
            predicted_label = classes[pred[incorrect_indices[idx]][0]]
            actual_label = classes[labels[incorrect_indices[idx]]]
            print(f"Image Index: {incorrect_indices[idx]}, Predicted Label: {predicted_label}, Actual Label: {actual_label}")
            plt.imshow(np.transpose(images[incorrect_indices[idx]], (1, 2, 0)))
            plt.show()
    return correct / total
class_labels = ['Baroque', 'Cubism', 'Minimalism', 'Popart', 'Ukiyo']
test_loader = torch.utils.data.DataLoader(test_data_features, batch_size=327, num_workers=num_workers, shuffle=False)
get_accuracy(model_an, test_loader, class_labels)
torch.Size([55]) Image Index: 0, Predicted Label: Ukiyo, Actual Label: Baroque
Image Index: 12, Predicted Label: Cubism, Actual Label: Baroque
Image Index: 20, Predicted Label: Ukiyo, Actual Label: Baroque
Image Index: 33, Predicted Label: Ukiyo, Actual Label: Baroque
Image Index: 57, Predicted Label: Popart, Actual Label: Baroque
Image Index: 65, Predicted Label: Popart, Actual Label: Baroque
Image Index: 93, Predicted Label: Popart, Actual Label: Cubism
Image Index: 95, Predicted Label: Popart, Actual Label: Cubism
Image Index: 97, Predicted Label: Popart, Actual Label: Cubism
Image Index: 101, Predicted Label: Minimalism, Actual Label: Cubism
Image Index: 112, Predicted Label: Ukiyo, Actual Label: Cubism
Image Index: 127, Predicted Label: Popart, Actual Label: Cubism
Image Index: 128, Predicted Label: Popart, Actual Label: Cubism
Image Index: 136, Predicted Label: Baroque, Actual Label: Cubism
Image Index: 138, Predicted Label: Popart, Actual Label: Minimalism
Image Index: 140, Predicted Label: Baroque, Actual Label: Minimalism
Image Index: 142, Predicted Label: Popart, Actual Label: Minimalism
Image Index: 159, Predicted Label: Cubism, Actual Label: Minimalism
Image Index: 175, Predicted Label: Popart, Actual Label: Minimalism
Image Index: 184, Predicted Label: Ukiyo, Actual Label: Minimalism
Image Index: 202, Predicted Label: Popart, Actual Label: Minimalism
Image Index: 210, Predicted Label: Cubism, Actual Label: Popart
Image Index: 212, Predicted Label: Baroque, Actual Label: Popart
Image Index: 213, Predicted Label: Cubism, Actual Label: Popart
Image Index: 215, Predicted Label: Ukiyo, Actual Label: Popart
Image Index: 216, Predicted Label: Cubism, Actual Label: Popart
Image Index: 217, Predicted Label: Cubism, Actual Label: Popart
Image Index: 220, Predicted Label: Minimalism, Actual Label: Popart
Image Index: 221, Predicted Label: Ukiyo, Actual Label: Popart
Image Index: 223, Predicted Label: Minimalism, Actual Label: Popart
Image Index: 230, Predicted Label: Ukiyo, Actual Label: Popart
Image Index: 236, Predicted Label: Ukiyo, Actual Label: Popart
Image Index: 238, Predicted Label: Cubism, Actual Label: Popart
Image Index: 241, Predicted Label: Cubism, Actual Label: Popart
Image Index: 243, Predicted Label: Ukiyo, Actual Label: Popart
Image Index: 244, Predicted Label: Minimalism, Actual Label: Popart
Image Index: 248, Predicted Label: Ukiyo, Actual Label: Popart
Image Index: 249, Predicted Label: Cubism, Actual Label: Popart
Image Index: 250, Predicted Label: Cubism, Actual Label: Popart
Image Index: 251, Predicted Label: Cubism, Actual Label: Popart
Image Index: 255, Predicted Label: Ukiyo, Actual Label: Popart
Image Index: 256, Predicted Label: Cubism, Actual Label: Popart
Image Index: 258, Predicted Label: Cubism, Actual Label: Popart
Image Index: 259, Predicted Label: Cubism, Actual Label: Popart
Image Index: 260, Predicted Label: Ukiyo, Actual Label: Popart
Image Index: 261, Predicted Label: Minimalism, Actual Label: Popart
Image Index: 262, Predicted Label: Cubism, Actual Label: Popart
Image Index: 265, Predicted Label: Ukiyo, Actual Label: Popart
Image Index: 268, Predicted Label: Cubism, Actual Label: Ukiyo
Image Index: 277, Predicted Label: Minimalism, Actual Label: Ukiyo
Image Index: 295, Predicted Label: Popart, Actual Label: Ukiyo
Image Index: 305, Predicted Label: Popart, Actual Label: Ukiyo
Image Index: 312, Predicted Label: Cubism, Actual Label: Ukiyo
Image Index: 322, Predicted Label: Popart, Actual Label: Ukiyo
Image Index: 324, Predicted Label: Minimalism, Actual Label: Ukiyo
torch.Size([0])
0.8367952522255193
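As a consistency check, the reported accuracy agrees with the 55 misclassifications printed above; the test-set size can be inferred from these two numbers:

```python
accuracy = 0.8367952522255193                  # reported test accuracy
misclassified = 55                             # count of misclassified images printed above
total = round(misclassified / (1 - accuracy))  # inferred test-set size
print(total, (total - misclassified) / total)  # 337 images; 282/337 reproduces the accuracy
```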
The main pattern we noticed from viewing the misclassified images was that the incorrect predictions mainly involved the Popart and Cubism classes. One reason could be the visual similarities between the two art styles: both use vibrant primary colours, prominent straight and curved lines, unconventional shapes, and fragmented compositions.
For Cubism in particular, this shows an area where the model could be confused. This confusion could stem from the two forms of Cubism that exist within the Cubism movement: Analytical Cubism and Synthetic Cubism. Analytical Cubism is defined by muted colors and complex planes, while Synthetic Cubism is defined by bright colors, simpler shapes, and even collaged elements [1]. The misclassification may be due to the styles having overlapping features. The misclassified Synthetic Cubist art had brighter colors and collage elements, which could be mistaken for the characteristics of pop art. On the other hand, if the artwork had a muted color palette and complex details, it could be misclassified as Baroque.
This suggests that while our model can identify broader art styles, we will need additional training data with more distinct labeling for the model to be able to identify sub-styles effectively.
To further analyze our results, we calculated a confusion matrix to see the true performance for the different classes. get_metrics_per_class function was used to calculate the F1, Precision, and Recall scores as shown in the equations below and plot the corresponding confusion matrix.
TP = True Positive, TN = True Negative, FP = False Positive, FN = False Negative
$$ \text{Precision} = \frac{TP}{TP + FP} \qquad \text{Recall} = \frac{TP}{TP + FN} \qquad F1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} $$
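A worked example with hypothetical per-class counts shows how these quantities combine:

```python
TP, FP, FN = 40, 20, 10     # hypothetical counts for one class
precision = TP / (TP + FP)  # 40/60 ~ 0.667
recall = TP / (TP + FN)     # 40/50 = 0.8
f1 = 2 * precision * recall / (precision + recall)
print(round(precision, 3), round(recall, 3), round(f1, 3))  # 0.667 0.8 0.727
```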
def get_metrics_per_class(model, data_loader, class_labels):
    model.eval()
    y_true = []
    y_pred = []
    with torch.no_grad():
        for imgs, labels in data_loader:
            if use_cuda and torch.cuda.is_available():
                imgs = imgs.cuda()
                labels = labels.cuda()
            output = model(imgs)
            _, predicted = torch.max(output, 1)
            y_true.extend(labels.cpu().numpy())
            y_pred.extend(predicted.cpu().numpy())

    f1_scores = f1_score(y_true, y_pred, average=None)
    precision, recall, _, _ = precision_recall_fscore_support(y_true, y_pred, average=None)

    # table of per-class metrics
    table_data = []
    headers = ["Class", "F1 Score", "Precision", "Recall"]
    for class_idx, class_label in enumerate(class_labels):
        table_data.append([class_label, round(f1_scores[class_idx], 3),
                           round(precision[class_idx], 3), round(recall[class_idx], 3)])
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.axis('off')
    table = ax.table(cellText=table_data, colLabels=headers, loc="center", cellLoc="center",
                     colColours=["#ADD8E6"] * len(headers))
    table.auto_set_font_size(False)
    table.set_fontsize(10)
    table.scale(1.5, 2)
    plt.show()

    # confusion matrix heatmap
    cm = confusion_matrix(y_true, y_pred)
    plt.figure(figsize=(8, 6))
    sns.heatmap(cm, annot=True, fmt="d", cmap="Blues", xticklabels=class_labels, yticklabels=class_labels)
    plt.xlabel("Predicted")
    plt.ylabel("True")
    plt.title("Confusion Matrix")
    plt.show()
class_labels = ['Baroque', 'Cubism', 'Minimalism', 'Popart', 'Ukiyo']
num_classes = len(class_labels)
get_metrics_per_class(model_an, test_loader, class_labels)
Findings from the Confusion Matrix:
We can see that the confusion matrix reflects our findings from visualizing the incorrectly labelled data. Baroque, Minimalism, and Ukiyo have the most correctly labelled images, shown in the diagonal entries of the confusion matrix (true positives). Some Baroque images were confused with Popart and Ukiyo, as shown in the off-diagonal entries (false positives).
The confusion matrix also confirms our observation from the misclassified images that Cubism and Popart were the most misclassified classes: Cubism has the highest number of false positives with Popart, while Popart has the highest number of false positives with Minimalism.
Findings from the Recall, Precision Scores:
Another finding is that Cubism has a low precision score but a high recall score. The low precision score tells us that the model is lenient in predicting Cubism, producing many false positives. Because so many images are predicted as Cubism, few true Cubism images are missed, which yields the high recall score.
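To make this trade-off concrete, here is a toy calculation (the counts are hypothetical for illustration, not taken from our results) showing how a large number of false positives depresses precision while recall stays high:

```python
# Hypothetical confusion counts for a single class, mirroring the
# Cubism behaviour described above: many false positives, few false negatives.
tp, fp, fn = 30, 20, 5

precision = tp / (tp + fp)   # 30/50 = 0.60  (dragged down by false positives)
recall = tp / (tp + fn)      # 30/35 ~ 0.857 (few true instances are missed)
print(round(precision, 3), round(recall, 3))  # 0.6 0.857
```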
To demonstrate that the model is generalizable and can correctly identify unseen images, new paintings were sourced from a third party. The Art Gallery of Ontario (AGO) has an online collection with art from a variety of artists and styles. Five new paintings were collected, one to represent each of the classes in the model, with the image URLs taken directly from the AGO's website.
The AGO also provided the labels for the data: searching "Baroque" or "Cubism" in their online database returned many examples that aligned with the classes we wished to identify.
# Define the list of image URLs that will be used to prove the effectiveness of the model.
urls = [
'https://dbi5a5cdy48wt.cloudfront.net/loris/co10/ago.6277.jp2/full/680,/0/default.jpg', # Baroque
'https://dbi5a5cdy48wt.cloudfront.net/loris/co7/ago.48945.jp2/full/680,/0/default.jpg', # Pop_art
'http://imagelicensing.ago.ca/internal/media/dispatcher/144133/preview', # Ukiyo
'https://dbi5a5cdy48wt.cloudfront.net/loris/co5/6510.jp2/full/680,/0/default.jpg', # Cubism
'https://dbi5a5cdy48wt.cloudfront.net/loris/co13/23127.jp2/full/680,/0/default.jpg' # Minimalism
]
labels = [
0, # Baroque
3, # Pop_art
4, # Ukiyo
1, # Cubism
2 # Minimalism
]
A series of helper functions is defined to process the images and obtain the features and predictions. The helper functions fall into two categories: functions that run the model and output the classifications, and functions that process the images from the URLs.
def get_prediction(model, data):
    '''
    A function that obtains the predicted labels on the new data.
    :param model: The trained AlexNet model
    :param data: The data loader that contains the AlexNet features for the new paintings.
    :return pred_class: a list of the predicted classes (in terms of their names).
    '''
    correct = 0
    total = 0
    for imgs, labels in data:
        str_labels = idx_to_class(labels)
        print(str_labels)
        imgs = imgs.squeeze()
        #############################################
        # To Enable GPU Usage
        if use_cuda and torch.cuda.is_available():
            imgs = imgs.cuda()
            labels = labels.cuda()
        #############################################
        output = model(imgs)
        # select index with maximum prediction score
        pred = output.max(1, keepdim=True)[1]
        correct += pred.eq(labels.view_as(pred)).sum().item()
        total += imgs.shape[0]
    pred_class = idx_to_class(pred)
    return pred_class
def get_torch_vars(xs, ys, gpu=False):
    '''
    A function that converts inputs and labels to their torch equivalent.
    :param xs: the input data
    :param ys: the label data
    :returns xs, ys: the torch versions of the input and label
    '''
    xs = torch.from_numpy(xs).float()
    ys = torch.from_numpy(ys).float()
    if gpu:
        xs = xs.cuda()
        ys = ys.cuda()
    return xs, ys
def get_features(image):
    '''
    Obtains the AlexNet features
    :param image: The new image data (torch)
    :returns features_tensor: The AlexNet features for the specified image.
    '''
    features = alexnet.features(image)
    # Detach from the computation graph and rebuild as a plain tensor
    features_tensor = torch.from_numpy(features.detach().numpy())
    return features_tensor
def idx_to_class(pred):
    '''
    A function that converts the class indices to strings
    :param pred: the class indices returned by the model.
    :returns pred_class: a list of the string labels associated with the predictions.
    '''
    pred_class = []
    for p in pred:
        if p == 0:
            pred_class.append("Baroque")
        elif p == 1:
            pred_class.append("Cubism")
        elif p == 2:
            pred_class.append("Minimalism")
        elif p == 3:
            pred_class.append("Pop_art")
        elif p == 4:
            pred_class.append("Ukiyo")
    return pred_class
from PIL import Image
import requests
import numpy as np

def get_img(url_list):
    '''
    :param url_list: The image URLs from the AGO
    :returns a numpy array of the retrieved images
    '''
    images = []
    for url in url_list:
        image = Image.open(requests.get(url, stream=True).raw)
        image = image.resize((224, 224))
        image = np.transpose(np.array(image), (2, 0, 1))  # HWC -> CHW
        image = new_data_process(image)
        images.append(image)
    return np.array(images)

def new_data_process(xs, max_pixel=256.0):
    '''
    Normalizes the images to the [0, 1) range
    '''
    xs = xs / max_pixel
    return xs
def visual(img):
    '''
    Prints out the list of images
    '''
    img = np.transpose(img[:5, :, :, :], [0, 2, 3, 1])
    for i in range(5):
        ax = plt.subplot(3, 5, i + 1)
        ax.imshow(img[i])
        ax.axis("off")
    plt.show()
The image URLs are processed and the AlexNet features are extracted. As seen below, the model correctly identified the art style for all five images provided, which indicates that the network can classify unseen data.
images = get_img(urls)
visual(images)
labels_np = np.array(labels)
img_torch, label_torch = get_torch_vars(images, labels_np)
dataset = []
for i in range(len(images)):
alex_img = get_features(img_torch[i])
dataset.append((alex_img, label_torch[i]))
loader = torch.utils.data.DataLoader(dataset, batch_size=len(images),
num_workers=num_workers, shuffle=False)
get_prediction(model_an, loader)
['Baroque', 'Pop_art', 'Ukiyo', 'Cubism', 'Minimalism']
['Baroque', 'Pop_art', 'Ukiyo', 'Cubism', 'Minimalism']
We found several projects that use convolutional neural networks to classify artworks. One paper, titled "Using Convolutional Neural Networks to Classify Art Genre", used a CNN without transfer learning to classify art styles. Similar to our project, their model was most successful at identifying Baroque artworks among all the styles, achieving 94% accuracy on Baroque with an overall test accuracy of 81% [2].
Another project we found was "Artist Identification with Convolutional Neural Networks", where the authors tested a variety of models ranging from a simple CNN to a ResNet-18 network with transfer learning. Their best result came from a ResNet-18 pre-trained on ImageNet, yielding a test accuracy of 89.8% [3]. Both projects highlight the potential of CNNs and transfer learning in the field of art classification, as well as the challenges they face, particularly limited and unbalanced data across styles.
To further improve the results of this project, there are some recommendations and next steps.
First, training on a larger dataset is key. Perhaps in the future, scaling up to 10-15 art styles with at least 300 images per style would improve the applicability of this project.
The AlexNet-based model could be improved by using more layers and convolutions, instead of the two-convolution, two-fully-connected classifier architecture we used earlier. However, the number of layers that can be added is limited, since AlexNet relies on fixed 224 x 224 pixel inputs and its feature maps are correspondingly small.
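As an illustration of this direction, a deeper classifier head over AlexNet's 256 x 6 x 6 feature maps might look like the sketch below. This is not a model we trained; the layer widths are assumptions chosen for illustration, not tuned values.

```python
import torch
import torch.nn as nn

class DeeperArtClassifier(nn.Module):
    """A hypothetical deeper head on top of AlexNet features (256 x 6 x 6)."""
    def __init__(self, num_classes=5):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(256, 128, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128 * 6 * 6, 256), nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(256, num_classes),
        )

    def forward(self, x):
        return self.fc(self.conv(x))

# AlexNet features for a 224x224 input have shape (N, 256, 6, 6)
out = DeeperArtClassifier()(torch.randn(2, 256, 6, 6))
print(out.shape)  # torch.Size([2, 5])
```

Padding the convolutions keeps the 6 x 6 spatial size, which is what allows stacking more layers despite the small feature maps.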
The reliance on AlexNet can be circumvented by further developing our own CNN model. One challenge is that training from scratch requires significant computational power and time; however, such a model would no longer be constrained by AlexNet's fixed input size.
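One way such a from-scratch model could avoid the fixed input size is adaptive pooling. The sketch below is a hypothetical example, not code from this project, and the layer sizes are illustrative assumptions:

```python
import torch
import torch.nn as nn

class FlexibleArtCNN(nn.Module):
    """A hypothetical from-scratch CNN; AdaptiveAvgPool2d produces a
    fixed-size feature map regardless of the input resolution, removing
    the 224x224 constraint inherited from AlexNet."""
    def __init__(self, num_classes=5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),  # always outputs 128 x 4 x 4
        )
        self.classifier = nn.Linear(128 * 4 * 4, num_classes)

    def forward(self, x):
        return self.classifier(torch.flatten(self.features(x), 1))

# The same model accepts different input resolutions:
model = FlexibleArtCNN()
print(model(torch.randn(1, 3, 224, 224)).shape)  # torch.Size([1, 5])
print(model(torch.randn(1, 3, 300, 180)).shape)  # torch.Size([1, 5])
```

This would let the aspect-ratio-preserving resizes mentioned earlier in the report be explored without retraining a fixed-size backbone.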
[1] Tate, “Cubism,” Tate. Accessed: Dec. 08, 2023. [Online]. Available: https://www.tate.org.uk/art/art-terms/c/cubism
[2] J. DuBois, “Using Convolutional Neural Networks to Classify Art Genre”.
[3] N. Viswanathan, “Artist Identification with Convolutional Neural Networks”.
%%shell
jupyter nbconvert --to html /content/MIE1517_team7_final_report.ipynb
[NbConvertApp] Converting notebook /content/MIE1517_team7_final_report.ipynb to html [NbConvertApp] Writing 20706022 bytes to /content/MIE1517_team7_final_report.html